fundus photograph
APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs
Liu, Bowen, Zhang, Weiyi, Chotcomwongse, Peranut, Chen, Xiaolan, Chen, Ruoyu, Pakaymaskul, Pawin, Arjkongharn, Niracha, Vongsa, Nattaporn, Cheng, Xuelian, Ge, Zongyuan, Huang, Kun, Li, Xiaohui, Duan, Yiru, Wang, Zhenbang, Xie, BaoYe, Chen, Qiang, Fu, Huazhu, Mahr, Michael A., Qu, Jiaqi, Chen, Wangyiyang, Wang, Shiye, Tan, Yubo, Li, Yongjie, He, Mingguang, Shi, Danli, Ruamviboonsuk, Paisan
Optical Coherence Tomography (OCT) provides high-resolution, 3D, and non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. In comparison, 2D color fundus photography offers faster acquisition and greater accessibility with less dependence on expensive devices. Although generative artificial intelligence has demonstrated promising results in medical image synthesis, translating 2D fundus images into 3D OCT images presents unique challenges due to inherent differences in data dimensionality and biological information between modalities. To advance generative models in the fundus-to-3D-OCT setting, the Asia Pacific Tele-Ophthalmology Society (APTOS-2024) organized a challenge titled Artificial Intelligence-based OCT Generation from Fundus Images. This paper details the challenge framework (referred to as APTOS-2024 Challenge), including: the benchmark dataset, evaluation methodology featuring two fidelity metrics-image-based distance (pixel-level OCT B-scan similarity) and video-based distance (semantic-level volumetric consistency), and analysis of top-performing solutions. The challenge attracted 342 participating teams, with 42 preliminary submissions and 9 finalists. Leading methodologies incorporated innovations in hybrid data preprocessing or augmentation (cross-modality collaborative paradigms), pre-training on external ophthalmic imaging datasets, integration of vision foundation models, and model architecture improvement. The APTOS-2024 Challenge is the first benchmark demonstrating the feasibility of fundus-to-3D-OCT synthesis as a potential solution for improving ophthalmic care accessibility in under-resourced healthcare settings, while helping to expedite medical research and clinical applications.
Predicting Stroke through Retinal Graphs and Multimodal Self-supervised Learning
Huang, Yuqing, Wittmann, Bastian, Demler, Olga, Menze, Bjoern, Davoudi, Neda
Early identification of stroke is crucial for intervention, requiring reliable models. We proposed an efficient retinal image representation together with clinical information to capture a comprehensive overview of cardiovascular health, leveraging large multimodal datasets for new medical insights. Our approach is one of the first contrastive frameworks that integrates graph and tabular data, using vessel graphs derived from retinal images for efficient representation. This method, combined with multimodal contrastive learning, significantly enhances stroke prediction accuracy by integrating data from multiple sources and using contrastive learning for transfer learning. The self-supervised learning techniques employed allow the model to learn effectively from unlabeled data, reducing the dependency on large annotated datasets. Our framework showed an AUROC improvement of 3.78% from supervised to self-supervised approaches. Additionally, the graph-level representation approach achieved superior performance to image encoders while significantly reducing pre-training and fine-tuning runtimes. These findings indicate that retinal images are a cost-effective method for improving cardiovascular disease predictions and pave the way for future research into retinal and cerebral vessel connections and the use of graph-based retinal vessel representations.
Integrating Deep Learning with Fundus and Optical Coherence Tomography for Cardiovascular Disease Prediction
Maldonado-Garcia, Cynthia, Zakeri, Arezoo, Frangi, Alejandro F, Ravikumar, Nishant
Early identification of patients at risk of cardiovascular diseases (CVD) is crucial for effective preventive care, reducing healthcare burden, and improving patients' quality of life. This study demonstrates the potential of retinal optical coherence tomography (OCT) imaging combined with fundus photographs for identifying future adverse cardiac events. We used data from 977 patients who experienced CVD within a 5-year interval post-image acquisition, alongside 1,877 control participants without CVD, totaling 2,854 subjects. We propose a novel binary classification network based on a Multi-channel Variational Autoencoder (MCVAE), which learns a latent embedding of patients' fundus and OCT images to classify individuals into two groups: those likely to develop CVD in the future and those who are not. Our model, trained on both imaging modalities, achieved promising results (AUROC 0.78 +/- 0.02, accuracy 0.68 +/- 0.002, precision 0.74 +/- 0.02, sensitivity 0.73 +/- 0.02, and specificity 0.68 +/- 0.01), demonstrating its efficacy in identifying patients at risk of future CVD events based on their retinal images. This study highlights the potential of retinal OCT imaging and fundus photographs as cost-effective, non-invasive alternatives for predicting cardiovascular disease risk. The widespread availability of these imaging techniques in optometry practices and hospitals further enhances their potential for large-scale CVD risk screening. Our findings contribute to the development of standardized, accessible methods for early CVD risk identification, potentially improving preventive care strategies and patient outcomes.
VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis
Wei, Hao, Liu, Bowen, Zhang, Minqing, Shi, Peilun, Yuan, Wu
Generalist foundation model has ushered in newfound capabilities in medical domain. However, the contradiction between the growing demand for high-quality annotated data with patient privacy continues to intensify. The utilization of medical artificial intelligence generated content (Med-AIGC) as an inexhaustible resource repository arises as a potential solution to address the aforementioned challenge. Here we harness 1 million open-source synthetic fundus images paired with natural language descriptions, to curate an ethical language-image foundation model for retina image analysis named VisionCLIP. VisionCLIP achieves competitive performance on three external datasets compared with the existing method pre-trained on real-world data in a zero-shot fashion. The employment of artificially synthetic images alongside corresponding textual data for training enables the medical foundation model to successfully assimilate knowledge of disease symptomatology, thereby circumventing potential breaches of patient confidentiality.
HyMNet: a Multimodal Deep Learning System for Hypertension Classification using Fundus Photographs and Cardiometabolic Risk Factors
Baharoon, Mohammed, Almatar, Hessa, Alduhayan, Reema, Aldebasi, Tariq, Alahmadi, Badr, Bokhari, Yahya, Alawad, Mohammed, Almazroa, Ahmed, Aljouie, Abdulrhman
In recent years, deep learning has shown promise in predicting hypertension (HTN) from fundus images. However, most prior research has primarily focused on analyzing a single type of data, which may not capture the full complexity of HTN risk. To address this limitation, this study introduces a multimodal deep learning (MMDL) system, dubbed HyMNet, which combines fundus images and cardiometabolic risk factors, specifically age and gender, to improve hypertension detection capabilities. Our MMDL system uses the DenseNet-201 architecture, pre-trained on ImageNet, for the fundus imaging path and a fully connected neural network for the age and gender path. The two paths are jointly trained by concatenating 64 features output from each path that are then fed into a fusion network. The system was trained on 1,143 retinal images from 626 individuals collected from the Saudi Ministry of National Guard Health Affairs. The results show that the multimodal model that integrates fundus images along with age and gender achieved an AUC of 0.791 [CI: 0.735, 0.848], which outperforms the unimodal model trained solely on fundus photographs that yielded an AUC of 0.766 [CI: 0.705, 0.828] for hypertension detection. Abbreviations BP, blood pressure; CVD, cardiovascular disease; EHR, electronic health record; EMR, electronic medical records; AI, artificial intelligence; DL, deep learning; MMDL, multimodal deep learning; SVM, support vector machine; FCNN, fully connected neural network; CNN convolutional neural network; ReLU; rectified linear unit; AUC, area under the operating characteristic curve, PR, area under the precision-recall curve; CI, confidence interval; MAE, mean absolute error; KAIMRC, King Abdullah International Medical Research Center. Keywords Artificial Intelligence; Machine Learning; Computer Vision. 1. Introduction Cardiovascular diseases persist as one of the primary causes of mortality worldwide, with hypertension, or high blood pressure (BP), serving as a significant contributing risk factor (1,2).
Detecting Severity of Diabetic Retinopathy from Fundus Images using Ensembled Transformers
Adak, Chandranath, Karkera, Tejas, Chattopadhyay, Soumi, Saqib, Muhammad
Diabetic Retinopathy (DR) is considered one of the primary concerns due to its effect on vision loss among most people with diabetes globally. The severity of DR is mostly comprehended manually by ophthalmologists from fundus photography-based retina images. This paper deals with an automated understanding of the severity stages of DR. In the literature, researchers have focused on this automation using traditional machine learning-based algorithms and convolutional architectures. However, the past works hardly focused on essential parts of the retinal image to improve the model performance. In this paper, we adopt transformer-based learning models to capture the crucial features of retinal images to understand DR severity better. We work with ensembling image transformers, where we adopt four models, namely ViT (Vision Transformer), BEiT (Bidirectional Encoder representation for image Transformer), CaiT (Class-Attention in Image Transformers), and DeiT (Data efficient image Transformers), to infer the degree of DR severity from fundus photographs. For experiments, we used the publicly available APTOS-2019 blindness detection dataset, where the performances of the transformer-based models were quite encouraging.
Estimation of best corrected visual acuity based on deep neural network - Scientific Reports
In this study, we investigated a convolutional neural network (CNN)-based framework for the estimation of the best-corrected visual acuity (BCVA) from fundus images. First, we collected 53,318 fundus photographs from the Gyeongsang National University Changwon Hospital, where each fundus photograph is categorized into 11 levels by retrospective medical chart review. Then, we designed 4 BCVA estimation schemes using transfer learning with pre-trained ResNet-18 and EfficientNet-B0 models where both regression and classification-based prediction are taken into account. According to the results of the study, the predicted BCVA by CNN-based schemes is close to the actual value such that 94.37% of prediction accuracy can be achieved when 3 levels of difference can be tolerated during prediction. The mean squared error and $$R^2$$ score were measured as 0.028 and 0.654, respectively. These results indicate that the BCVA can be predicted accurately for extreme cases, i.e., the level of BCVA is close to either 0.0 or 1.0. Moreover, using the Guided Grad-CAM, we confirmed that the macula and the blood vessel surrounding the macula are mainly utilized in the prediction of BCVA, which validates the rationality of the CNN-based BCVA estimation schemes since the same area is also exploited during the retrospective medical chart review. Finally, we applied the t-distributed stochastic neighbor embedding to examine the characteristics of CNN-based BCVA estimation schemes. The developed BCVA estimation schemes can be employed to obtain the objective measurement of BVCA as well as the medical screening of people with poor access to medical care through smartphone-based fundus imaging.
A ResNet is All You Need? Modeling A Strong Baseline for Detecting Referable Diabetic Retinopathy in Fundus Images
Castilla, Tomás, Martínez, Marcela S., Leguía, Mercedes, Larrabide, Ignacio, Orlando, José Ignacio
Deep learning is currently the state-of-the-art for automated detection of referable diabetic retinopathy (DR) from color fundus photographs (CFP). While the general interest is put on improving results through methodological innovations, it is not clear how good these approaches perform compared to standard deep classification models trained with the appropriate settings. In this paper we propose to model a strong baseline for this task based on a simple and standard ResNet-18 architecture. To this end, we built on top of prior art by training the model with a standard preprocessing strategy but using images from several public sources and an empirically calibrated data augmentation setting. To evaluate its performance, we covered multiple clinically relevant perspectives, including image and patient level DR screening, discriminating responses by input quality and DR grade, assessing model uncertainties and analyzing its results in a qualitative manner. With no other methodological innovation than a carefully designed training, our ResNet model achieved an AUC = 0.955 (0.953 - 0.956) on a combined test set of 61007 test images from different public datasets, which is in line or even better than what other more complex deep learning models reported in the literature. Similar AUC values were obtained in 480 images from two separate in-house databases specially prepared for this study, which emphasize its generalization ability. This confirms that standard networks can still be strong baselines for this task if properly trained.
Multimodal Information Fusion for Glaucoma and DR Classification
Li, Yihao, Daho, Mostafa El Habib, Conze, Pierre-Henri, Hajj, Hassan Al, Bonnin, Sophie, Ren, Hugang, Manivannan, Niranchana, Magazzeni, Stephanie, Tadayoni, Ramin, Cochener, Béatrice, Lamard, Mathieu, Quellec, Gwenolé
Multimodal information is frequently available in medical tasks. By combining information from multiple sources, clinicians are able to make more accurate judgments. In recent years, multiple imaging techniques have been used in clinical practice for retinal analysis: 2D fundus photographs, 3D optical coherence tomography (OCT) and 3D OCT angiography, etc. Our paper investigates three multimodal information fusion strategies based on deep learning to solve retinal analysis tasks: early fusion, intermediate fusion, and hierarchical fusion. The commonly used early and intermediate fusions are simple but do not fully exploit the complementary information between modalities. We developed a hierarchical fusion approach that focuses on combining features across multiple dimensions of the network, as well as exploring the correlation between modalities. These approaches were applied to glaucoma and diabetic retinopathy classification, using the public GAMMA dataset (fundus photographs and OCT) and a private dataset of PlexElite 9000 (Carl Zeis Meditec Inc.) OCT angiography acquisitions, respectively. Our hierarchical fusion method performed the best in both cases and paved the way for better clinical diagnosis.
Deep learning algorithm shows accuracy in detecting glaucoma on fundus photographs
Automated deep learning analysis of fundus photographs showed high diagnostic accuracy in determining primary open-angle glaucoma, with increased ability to detect glaucoma earlier than human readers. A deep learning (DL) algorithm was trained, validated and tested on the fundus stereophotographs of participants enrolled in the Ocular Hypertension Treatment Study (OHTS), a randomized clinical trial evaluating the safety and efficacy of IOP-lowering medications in preventing progression from ocular hypertension to primary open-angle glaucoma (POAG). Assessment of optic disc and visual field changes in the OHTS was performed by two reading centers and a masked committee of glaucoma specialists, "a demanding, laborious and complicated task," according to the authors. The OHTS data set consisted of fundus photographs from 1,636 participants, of which 1,147 were included in the training set, 167 in the validation set and 322 in the test set. The DL model detected conversion to POAG with high diagnostic accuracy, suggesting that artificial intelligence can offer a reliable tool to automate the determination of glaucoma for clinical trial management, simplifying the process of human interpretation and, possibly, making it more standardized, objective and accurate.